ABSTRACT

This paper reviews recent research in the Netherlands
on the application of decision theory to test-based decision making
about personnel selection and student placement. The review is based
on an earlier model proposed for the classification of decision
problems, and emphasizes an empirical Bayesian framework.
Classification decisions with threshold utility are discussed to
provide an example of the application of Bayesian theory to
test-based decision making. Test results from the 1981 administration
of the Eindtoets Basisonderwijs are analyzed with respect to the type
of secondary education chosen by Dutch students at the end of primary
education: lower vocational education, lower general education, or
middle general education. A 55-item bibliography is attached.
(GDC)
Abstract

This paper reviews recent research in the Netherlands on the application
of decision theory to test-based decision making. The review is based on
a classification of decision problems proposed in van der Linden (1985a)
and emphasizes an empirical Bayesian framework. As a more specific
example of the application of Bayesian theory to test-based decision
making, the problem of classification decisions with threshold utility is
discussed.
Advances in the Application of Decision Theory
to Test-based Decision Making

Historically, the use of psychological and educational tests has its
roots in the necessity of selection and placement decisions in public
domains such as education, the army, and the government. This is
excellently demonstrated in DuBois' (1970) historiography of such cases
as Binet's early work on test development for the assignment of pupils
to special education, the testing of conscripts for placement in the
army during World War I, and the examination of applicants for the civil
service in ancient China. It is no coincidence that in each of these
domains decision making is characterized both by a high visibility and
massive numbers of subjects. In such cases it seems perfectly logical to
turn to tests as an objective basis for decisions. If tests had not been
invented for this purpose yet, we would invent them today.

It is conspicuous that, although the practice of test use has its
roots in decision making, test theory has been developed mainly as a
theory of measurement. The origins of test theory are in Spearman's
pioneering work on the unreliability of test scores, which laid the
foundations for classical test theory as a theory of measurement
error. Modern item response theory shows the same concern with
measurement (parameter estimation) and was not conceived as a theory of
decision making either. The history of test theory shows a few
exceptions, though, of which the publication of the Taylor-Russell
(1939) tables, and their subsequent influence on the testing literature,
and Cronbach and Gleser's (1965) well-known monograph deserve special
mention. To date, the latter has been the first and only monograph
attempting to provide test-based decision making with a sound
theoretical basis.

Recently, however, the situation has changed somewhat and some test
theorists are now seriously involved in attempts to model and to
optimize the use of tests for decision making, most of them using
Bayesian decision theory as their frame of reference. The major impetus
for this concern has come from the introduction of modern instructional
systems such as individualized instruction, learning for mastery, and
computer-aided instruction. In such systems there typically is much
testing for instructional decision making purposes, which confronts
their developers with the problem of designing and studying optimal
decision procedures.

It is the goal of this paper to give a short review of recent work on
the theory of test-based decision making in the Netherlands. The
emphasis on Dutch contributions means that no reference is made to the
mostly excellent work in this area in the U.S. as, for instance, by
Huynh (1976, 1980a, 1980b), Novick and his associates (e.g., Chuang,
Chen, & Novick, 1981; Novick & Lindley, 1978; Novick & Petersen, 1976),
and Wilcox (1976, 1977, 1978, 1979). In the review a typology of
test-based decision making given in van der Linden (1985a) is used. The
paper concludes with the discussion of classification decisions as a
more specific example of test-based decision making.



A Classification of Test-based Decisions



Each different type of decision making can be identified as a specific
configuration of the following elements:
(1) A test providing the information the decisions are based on;
(2) One or more treatments with respect to which the decisions are made;
(3) One or more criteria by which the success of the treatments is
    measured.

As will be illustrated below, using these elements the following types
of decisions can be distinguished:
(1) Selection;
(2) Mastery;
(3) Placement; and
(4) Classification decisions.

To each of these types the following restrictions or refinements can
apply:
(1) Quota restrictions. For some treatments the numbers of vacancies are
    constrained.
(2) Multivariate test data. The decisions are based on data from a whole
    test battery instead of a single test.
(3) Multivariate criteria. The success of the treatments is measured by
    multiple criteria.
(4) Subpopulations. The problem of culture-fair decision making arises
    because of the presence of subpopulations reacting differentially to
    the test items.

A Review of Dutch Decision Theory Research

Selection Decisions



In selection problems the decision at stake is the acceptance or
rejection of individuals for a treatment. Selection decisions are
characterized by the fact that the test is administered before the
treatment takes place but that the criterion is measured afterwards.
Well-known examples of selection problems are the selection of personnel
in industry and the admission of students to educational programs. The
formal structure of a selection problem is shown in Figure 1.

Insert Figure 1 about here

Selection research in the Netherlands has a tradition as long as in
any other western country, with early publications dating back to the
1920s (e.g., de Quay, 1925, in van Naerssen, 1963). Most research can be
considered as applied work on problems in personnel selection. Examples
of popular problems are research on criterion choice and analysis, test
and item selection, validity studies of test batteries, reliability of
selection interviews, techniques of job sampling, and the like. Also a
considerable amount of the selection literature has been devoted to
ethical issues. Most applications of selection research have been in
personnel psychology and not in education because Dutch education has
traditionally been based on a centralized certification system and not
on entrance selection. A recent exception, however, has been the
selection of students for medical programs in higher education. Reviews
of selection research are given in handbooks by Hofstee (1983) and Roe
(1983).

As for the test theoretical framework adopted in selection research,
the selection problem has generally been approached as a prediction
problem in which regression lines or expectancy tables should be
employed to predict whether the criterion scores of individuals exceed a
certain threshold value so that their selection guarantees a success.
More recently, original work along these lines has been published in
which corrections for the restriction of range in the validity
coefficient of selection procedures are addressed (Brouwer & Vijn, 1978;
Brouwer & Vijn, 1979; Roe, 1979). Selection decisions with quota
restrictions have long been evaluated with the aid of the Taylor-Russell
tables, which give success ratios for a number of parameters
characterizing the selection procedure.

A major breakthrough in selection theory in the Netherlands was
offered by van Naerssen (1963, 1965a, 1970), who introduced the
application of empirical Bayesian decision theory in selection research.
An extensive introduction to van Naerssen's early work, which arose from
a case study on the selection of drivers for the army, can be found in
his addendum to Cronbach and Gleser's monograph (van Naerssen, 1965b).
Among the topics dealt with in van Naerssen (1963) are the computation
of optimal testing time with a fixed selection ratio, the determination
of optimal selection ratios, and two-stage selection procedures. Van
Naerssen (1963) also offers some decision theory for a selection problem
with two subpopulations. Apart from van Naerssen's contributions, not
much work on the selection problem from a decision-theoretic point of
view can be found in the Netherlands. A recent exception, however, is a
paper by Mellenbergh and van der Linden (1982), who give some decision
theory for quota-free and fixed-quota selection from several
subpopulations with a linear utility structure and illustrate their
results with an application to a culture-fair testing problem.



Mastery Decisions

Unlike selection decisions, mastery decisions are made after the
treatment has been administered. The decision to be made is whether the
individuals who have followed the treatment meet its goals or not. A
further characteristic is that in the mastery decision problem the
criterion is internal to the test and not external. It is unreliability
of the test as a representative of the criterion that opens the
possibility of making wrong decisions and creates the mastery decision
problem. Examples of mastery decisions are pass-fail and certification
decisions in education, but also, e.g., decisions with respect to
successfulness of therapies in clinical settings. Figure 2 displays
the formal structure of a mastery decision problem.

Insert Figure 2 about here

As opposed to the selection problem, research on the mastery
decision problem in the Netherlands has been mainly test theoretic with
less emphasis on applied issues. Again it was van Naerssen who took the
lead and introduced the topic and its related problem of the equating of
mastery standards in a series of papers (1966, 1971, 1974a). But now
others have followed. The following issues have been studied more or
less extensively:

1. (Empirical) Bayes decision rules. The problem of Bayes rules for
   mastery decisions with a binomial error, a beta prior and a threshold
   loss function has been addressed by Mellenbergh, Koppelaar, and van
   der Linden (1977). Mellenbergh also suggested the idea of a linear
   instead of a threshold loss function, which has the advantage of being
   continuous in the true score for both the mastery and the nonmastery
   decision. This idea was elaborated for the classical test model with
   an unspecified prior in van der Linden and Mellenbergh (1977), while
   properties of Bayes rules for this problem were studied further in
   van der Linden (1980, 1984a, 1984b). Estimation procedures and
   properties of Bayes rules for latent state models with threshold loss
   are explored in van der Linden (1978, 1981a, 1982a, 1983).

2. Decision without priors. The above decision rules are optimal in the
   Bayes sense for an empirical population of subjects. Van den Brink
   (1982) takes the position that this is not consonant with the idea of
   absolute measurement and gives various results for mastery testing
   under a binomial error model adopting a Neyman-Pearson framework of
   hypothesis testing. Along the same lines van den Brink and Koele
   (1980) and van der Linden (1982b) have studied the effect of guessing
   on multiple-choice items on decision rules. Minimax solutions for the
   binomial error model with threshold loss are discussed in Veldhuizen
   (1982).

3. Utility structure. Properties of Bayes rules may depend heavily on
   the utility structure adopted. A usual approach to the utility
   problem is the subjective one in which the decision theorist adopts a
   family of utility functions that is plausible because it meets some
   obvious formal conditions and the decision maker is requested to
   identify a member of it on intuitive grounds. The assessment of
   utility functions can also be based on empirical methods such as
   lottery or scaling methods. Scaling methods for the mastery decision
   problem have recently been studied by Vrijhof, Mellenbergh, and van
   den Brink (1983).

4. Item selection. In most research reviewed earlier in this section, the
   problem was to derive, under certain assumptions, optimal decision
   rules for a given test. If a domain of items from which the test has
   to be selected is available, another optimization problem arises,
   namely the optimal selection of items for decision making. Two
   different lines of research can be reported. De Gruijter and
   Hambleton (1983) and Hambleton and de Gruijter (1983) have based
   their selection on the value of the item information function at the
   mastery standard. Simulation studies of this selection procedure
   against random sampling from the same domain showed a considerable
   improvement in terms of the percentage of misclassification of
   examinees for the resulting test. The same procedure, but with
   selection on the first derivative of the item-characteristic curve at
   the standard on the ability scale, was studied earlier in van
   Naerssen (1977a, 1977b). Mellenbergh and van der Linden (1982) have
   proposed a different procedure in which items are selected on the
   basis of their contribution to the Bayes risk. They were able to show
   under what conditions this criterion boils down to selection using
   classical item indices.

5. Evaluation of decision procedures. Measurement procedures are usually
   evaluated by their reliability or estimation accuracy but for
   decision procedures this seems less adequate. Van der Linden and
   Mellenbergh (1978) suggested using the Bayes risk for this purpose
   and proposed to standardize this on the interval [0,1] using the risk
   of procedures with test scores having no and full information about
   the criterion as reference points. They also showed under what
   conditions the standardized risk is equal to classical test indices
   such as the reliability coefficient. In Mellenbergh and van der
   Linden (1979) the same procedure is outlined for test-based decision
   making with an external criterion (e.g., selection decisions).
   A different perspective on the evaluation of decision procedures is
   robustness analysis. A concept introduced in Vijn (1980) and explored
   further in Vijn and Molenaar (1981) is that of the robustness region,
   which is defined as the subset of the parameter space of the decision
   problem giving rise to the same decision rule. An application of
   robustness region analysis to a mastery decision problem can be found
   in the latter reference.

6. Standard setting. Mastery decision making supposes the presence of a
   threshold value or standard on the criterion separating the "masters"
   from the "nonmasters". A useful standard setting method is the
   so-called kernel-item method in which judges indicate which items
   represent the standard best, and next the standard is computed from
   the statistics of these items (de Groot & van Naerssen, 1975, sect.
   19U; van Naerssen, 1974b). A proposal accounting for possible
   uncertainty or inaccuracy in standard setting procedures by replacing
   standards by distributions of possible values is elaborated in de
   Gruijter (1980). That there can be much inaccuracy in standard
   setting procedures is demonstrated in van der Linden (1982c), who
   used calculations under an item response model to check for
   specification errors in the Angoff and Nedelsky methods and found
   that errors larger than .20-.25 were no exception.
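To make the first issue in the list above more concrete, the following is a minimal sketch of a Bayes rule for a mastery decision with a binomial error model, a beta prior and a threshold loss function. It is not taken from the papers cited; the function names, the restriction to integer prior parameters (which allows the beta distribution function to be written as a binomial sum), and all numerical values are assumptions made for illustration only.

```python
from math import comb


def beta_cdf(p, a, b):
    """Distribution function of Beta(a, b) at p for positive integers a and b.

    Uses the identity I_p(a, b) = P(Bin(a + b - 1, p) >= a).
    """
    m = a + b - 1
    return sum(comb(m, i) * p**i * (1 - p) ** (m - i) for i in range(a, m + 1))


def bayes_cutting_score(n, a, b, standard,
                        loss_false_master=1.0, loss_false_nonmaster=1.0):
    """Smallest number-right score at which declaring mastery has the
    smaller posterior expected threshold loss.

    n        : test length (binomial error model)
    a, b     : integer parameters of the beta prior on the true score
    standard : mastery standard on the true-score scale
    """
    for x in range(n + 1):
        # Posterior of the true score given x is Beta(a + x, b + n - x).
        p_nonmaster = beta_cdf(standard, a + x, b + n - x)
        risk_master = loss_false_master * p_nonmaster
        risk_nonmaster = loss_false_nonmaster * (1 - p_nonmaster)
        if risk_master <= risk_nonmaster:
            return x
    return n + 1  # mastery is never declared


# Hypothetical setting: 10 items, uniform prior, standard at .70.
print(bayes_cutting_score(n=10, a=1, b=1, standard=0.70))
# Making a false mastery decision three times as costly.
print(bayes_cutting_score(n=10, a=1, b=1, standard=0.70, loss_false_master=3.0))
```

In this sketch a larger loss for false mastery decisions can only move the cutting score upward, which is the kind of dependence on the utility structure referred to under the third issue.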

Placement Decisions

In placement problems several alternative treatments are available and
it is the decision maker's task to assign individuals on the basis of
their test scores to the most promising treatment. All individuals are
administered the same test and the success of each treatment is measured
by the same criterion. Unlike the selection problem, each individual is
assigned to a treatment. Figure 3 shows the case of a placement decision
with two treatments. Examples of placement decisions are in
individualized instruction where students are assigned to different
routes through an instructional unit all leading to the same objective.

Insert Figure 3 about here

The traditional approach to the problem is that of linear regression
analysis with a separate regression line for each treatment and the
assignment of individuals to the treatment with the largest predicted
criterion score. A Bayesian version of this approach is offered in Vijn
(1980), which offers the option of incorporating previous information in
placement decisions via the specification of prior distributions for the
regression parameters.
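The traditional regression approach described above can be sketched as follows. This is only an illustration: the least-squares helper, the treatment labels "A" and "B", and the data are hypothetical and do not come from any of the studies reviewed.

```python
def fit_line(xs, ys):
    """Ordinary least-squares regression of criterion scores on test scores.

    Returns (intercept, slope).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def place(x, lines):
    """Assign test score x to the treatment with the largest predicted
    criterion score; `lines` maps treatment label -> (intercept, slope)."""
    return max(lines, key=lambda t: lines[t][0] + lines[t][1] * x)


# Hypothetical test and criterion scores observed under two treatments.
test_scores = [0, 2, 4, 6, 8, 10]
lines = {
    "A": fit_line(test_scores, [4.0, 4.4, 4.8, 5.2, 5.6, 6.0]),
    "B": fit_line(test_scores, [1.0, 2.6, 4.2, 5.8, 7.4, 9.0]),
}
for x in (3, 8):
    print(x, place(x, lines))
```

With two treatments the rule amounts to a single cutting score on the test at the point where the two regression lines intersect, provided the lines are not parallel.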

Vijn's approach, although fully Bayesian, still views the placement
problem as a prediction problem. A treatment of placement decisions from
a decision-theoretic viewpoint is given in van der Linden (1981b). This
paper formalizes the placement decision as an empirical Bayes problem
with different utility functions and probability models for each
treatment and gives decision rules for the cases of utility functions
from the threshold, linear and normal-ogive families. The paper also
indicates how optimal rules for placement decisions with subpopulations
can be found.

Classification Decisions

As is clear from Figure 4, the difference between classification and
placement decisions is that in the former each treatment has its own
criterion. Otherwise, these two types of decisions have identical
properties. Examples of classification decisions occur in vocational
guidance situations when the most promising schools or careers must be
identified.

Insert Figure 4 about here

The most popular approach to classification decisions has again been
the use of linear-regression techniques. Each criterion is then mapped
on a common utility scale and the decision rule is to assign individuals
to the treatment with the largest predicted utility. The classification
problem has hardly been treated as a Bayesian decision problem. As a
more extensive example of the application of decision theory to
test-based decision making, the following section discusses the problem
of classification decisions with threshold utility and illustrates the
use of a Bayes rule for this case with an empirical application. A full
treatment of the theory and the application is given separately in van
der Linden (1985b), where further details can be obtained.

Classification Decisions with Threshold Utility 

The classification problem can be formalized as follows. There is a
series of individuals who can be considered to be drawn randomly from
some population P and must be classified into t+1 treatments indexed by
$j = 0, 1, \ldots, t$. Each treatment leads to a different distribution
for P on its associated criterion, which is denoted by a random variable
$Y_j$ with range $R_j$, which will here be considered to be continuous
(although in some applications $Y_j$ may be discrete). The test scores
observed prior to the treatment are denoted by a random variable $X$
with discrete values $x = 0, 1, \ldots, n$ and probability function
$\lambda(x)$. It is assumed that P yields a joint distribution of test
and criterion scores with probability (density) function
$\omega_j(x, y_j)$.

Suppose that each treatment is followed by a mastery decision
indicating whether the treatment has been successful or not. Formally,
the classification problem can then be represented as a problem with
threshold utility. An appropriate utility function is

(1)  $u_j(y_j) = \begin{cases} w_j & \text{for } y_j \ge d_j \\ v_j & \text{for } y_j < d_j \end{cases}$

with

$w_j > v_j$ for all values of j,

where $d_j$ is the cutting score on criterion j defining the mastery
decision rule for treatment j, while $w_j$ and $v_j$ are the utilities of
reaching the mastery and nonmastery status, respectively.
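A small sketch may help to see how the threshold utility (1) is used. Under (1) the expected utility of treatment j at a given test score equals $w_j$ minus $(w_j - v_j)$ times the conditional probability of nonmastery. In the code below the normal conditional distribution and all numbers are hypothetical choices made only for illustration; nothing in the model requires normality.

```python
from statistics import NormalDist


def threshold_utility(y, d, w, v):
    """Utility (1): w if criterion score y reaches the cutting score d, else v."""
    return w if y >= d else v


def expected_utility(d, w, v, conditional):
    """Expected threshold utility of a treatment at a given test score.

    `conditional` is the conditional distribution of the criterion given the
    test score; any object with a .cdf method will do.
    """
    p_nonmastery = conditional.cdf(d)  # P(Y_j < d_j | x) for continuous Y_j
    return w - (w - v) * p_nonmastery


# Hypothetical values: cutting score 55, criterion distributed N(60, 10) at x.
at_x = NormalDist(mu=60.0, sigma=10.0)
print(threshold_utility(58.0, d=55.0, w=1.0, v=0.0))
print(expected_utility(d=55.0, w=1.0, v=0.0, conditional=at_x))
```

Because only the probability of falling below $d_j$ enters, the full shape of the conditional criterion distribution matters only through its value at the cutting score.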

It is assumed that the Bayes rule for this problem has a monotone
shape; i.e., takes the form of a series of cutting scores on the test

(2)  $0 = c_0 \le c_1 \le \ldots \le c_j \le \ldots \le c_{t+1} = n$  $(t \le n)$

such that treatment j is assigned in the event of $c_j \le X < c_{j+1}$
(for $j = t$ the second inequality is not strict). A necessary and
sufficient condition for (2) is that

(3)  $w_j - v_j \ge w_{j-1} - v_{j-1}$  $(j = 1, \ldots, t)$

and that

(4)  $\{\Omega_j(y_j \mid x)\}$,

the conditional distributions of $Y_j$ given $X = x$, are stochastically
increasing. It is assumed that the treatments are in proper order
reflected by their index.

From van der Linden (1985b) it follows that the expected utility of
the procedure is maximal if, for each pair of treatments $(j-1, j)$,
$c_j$ is chosen as the smallest value of x for which

(5)  $(w_{j-1} - v_{j-1})\,\Omega_{j-1}(d_{j-1} \mid x) - (w_j - v_j)\,\Omega_j(d_j \mid x)$

is positive.
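The search for a cutting score described above can be sketched in a few lines. The sketch takes expression (5) as printed and reads $\Omega_j(d_j \mid x)$ as the conditional probability of nonmastery under treatment j. The logistic form of the success probabilities, the parameter values, and the function names are assumptions introduced here for illustration; they are not results from van der Linden (1985b).

```python
from math import exp


def nonmastery_prob(x, intercept, slope):
    """Omega_j(d_j | x) under an assumed logistic model for the probability
    of success: P(success | x) = 1 / (1 + exp(-(intercept + slope * x)))."""
    return 1.0 - 1.0 / (1.0 + exp(-(intercept + slope * x)))


def cutting_score(n, lower, upper, diff_lower=1.0, diff_upper=1.0):
    """Smallest test score x in 0..n for which expression (5) is positive.

    lower, upper : (intercept, slope) for treatments j-1 and j
    diff_lower   : w_{j-1} - v_{j-1}
    diff_upper   : w_j - v_j
    Returns None if the expression is never positive on 0..n.
    """
    for x in range(n + 1):
        value = (diff_lower * nonmastery_prob(x, *lower)
                 - diff_upper * nonmastery_prob(x, *upper))
        if value > 0:
            return x
    return None


# Hypothetical logistic parameters for two adjacent treatments, test length 50.
lower = (0.5, 0.05)
upper = (-3.25, 0.15)
print(cutting_score(50, lower, upper))                  # equal utility differences
print(cutting_score(50, lower, upper, diff_upper=2.0))  # unequal utility differences
```

Only the two utility differences enter the search, so rescaling both by the same positive constant leaves the cutting score unchanged.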

Since the solution of (5) only depends on the differences between $w_j$
and $v_j$ and not on their individual values, an interesting case arises
if it can be assumed that

(6)  $w_j - v_j = \text{constant}$

for one or more pairs of adjacent treatments. Then (5) reduces to

(7)  $\Omega_{j-1}(d_{j-1} \mid x) - \Omega_j(d_j \mid x)$

and it is no longer needed to specify the values of the utility
parameters. A further special case is if $\Omega_j(y_j \mid x)$ can be
assumed to be a location-scale family, in which case analytic solutions
are possible. For these and other cases, see van der Linden (1985b).



An Empirical Example 



The example in this section is derived from a well-known problem in the
Netherlands, namely the choice of an appropriate continuation school at
the end of primary education. Several types of secondary education are
available, running from lower level vocational to university track
programs. A popular achievement test assisting parents and principals in
making this choice is the Eindtoets Basisonderwijs prepared annually by
the National Institute of Educational Measurement (Cito). In the
following analyses, data from the 1981 administration of the test are
used, and the following types of secondary education are selected as
treatments: Lower Vocational Education (LVE), Lower General Education
(LGE), and Middle General Education (MGE). Success on the criterion was
for each treatment defined as passing the first year of its program.

It was assumed that the probabilities of success on Y as a function
of x could be modeled as a logistic distribution function. Table 1 gives
the empirical proportions of successes for each treatment. As only
grouped data were available, logit analysis of the proportions was
applied for the middles of the intervals reported in the table. The
bottom line of the table shows that the data yielded a nice fit to the
logit model.

Insert Table 1 about here
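One common way to carry out a logit analysis of grouped proportions at interval midpoints is weighted least squares on the empirical logits. The sketch below shows that technique; the text does not state which estimator was actually used, so the choice of weights, the function names, and the grouped data are all assumptions for illustration and are not the Eindtoets figures of Table 1.

```python
from math import exp, log


def fit_grouped_logit(midpoints, proportions, counts):
    """Weighted least-squares fit of logit(p) = intercept + slope * x.

    midpoints   : middles of the test-score intervals
    proportions : observed proportions of successes (strictly between 0 and 1)
    counts      : numbers of examinees per interval
    Weights n * p * (1 - p) are the usual inverse-variance weights for
    empirical logits.
    """
    logits = [log(p / (1.0 - p)) for p in proportions]
    weights = [n * p * (1.0 - p) for n, p in zip(counts, proportions)]
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, midpoints)) / total
    mean_z = sum(w * z for w, z in zip(weights, logits)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, midpoints))
    sxz = sum(w * (x - mean_x) * (z - mean_z)
              for w, x, z in zip(weights, midpoints, logits))
    slope = sxz / sxx
    return mean_z - slope * mean_x, slope


def success_probability(x, intercept, slope):
    """Fitted logistic probability of success at test score x."""
    return 1.0 / (1.0 + exp(-(intercept + slope * x)))


# Hypothetical grouped data for one treatment.
midpoints = [5.0, 15.0, 25.0, 35.0, 45.0]
counts = [40, 120, 200, 150, 60]
proportions = [0.20, 0.38, 0.61, 0.79, 0.92]
intercept, slope = fit_grouped_logit(midpoints, proportions, counts)
print(intercept, slope, success_probability(30.0, intercept, slope))
```

Fitting one such line per treatment gives the logistic regression lines that are compared in the analysis that follows.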

Firstly, it is assumed that (6) holds for the treatments so that (7)
amounts to a comparison between the logistic regression lines. The
results are given in Figure 5 and show that the dominant treatment
is LGE for almost all possible test scores; only for test scores below
x = 4 does the choice of another treatment (LVE) appear to be better.

Insert Figure 5 about here

Secondly, the sensitivity of the solution in Figure 5 to deviation from
(6) is analyzed in Table 2. As could be expected from the closeness of
the logistic regression lines for LGE and MGE, the cutting score between
these two treatments is most sensitive to deviations in the utility
ratio from unity. The cutting score between LGE and LVE, however,
appears to be quite robust to changes in the utility ratio.

Insert Table 2 about here



Conclusion

This review of research on decision theory for test-based decision
making in the Netherlands shows an early interest in the selection
problem and a subsequent emphasis on the mastery decision problem.
Recently, offshoots to other decision problems have become visible. It is
expected that the interest in decision making will continue and that
more complicated types of decision making (e.g., with quota constraints,
multivariate test data and criteria, and/or subpopulations) will be
explored soon.




References 



Brouwer, U. & Vijn, P. (1978). De empirische power van 2 procedures bij
"restriction of range" (case 1). Tijdschrift voor Onderwijsresearch,
2, 106-112.

Brouwer, U. & Vijn, P. (1979). Bayesiaanse schatters voor de
correlatiecoefficient in "restriction of range" (geval 1). Tijdschrift
voor Onderwijsresearch, 4, 281-290.

Chuang, D.T., Chen, J.J. & Novick, M.R. (1981). Theory and practice for
the use of cut-scores for personnel decisions. Journal of Educational
Statistics, 107-128.

Cronbach, L.J. & Gleser, G.C. (1965). Psychological tests and personnel
decisions (2nd ed.). Urbana, Ill.: University of Illinois Press.

de Groot, A.D. & van Naerssen, R.F. (Eds.) (1973, deel I; 1975, deel
II). Studietoetsen, construeren, afnemen, analyseren. Den Haag:
Mouton.

de Gruijter, D.N.M. (1980). Accounting for uncertainty in performance
standards. Paper presented at the Fourth International Symposium on
Educational Testing, Antwerp, Belgium, June 24-27.

de Gruijter, D.N.M. & Hambleton, R.K. (1983). Using item response models
in criterion-referenced test item selection. In R.K. Hambleton (Ed.),
Applications of item response theory. Vancouver, British Columbia:
Educational Research Institute of British Columbia.

DuBois, P.H. (1970). A history of psychological testing. Boston: Allyn &
Bacon, Inc.

Hambleton, R.K. & de Gruijter, D.N.M. (1983). Applications of item
response models to criterion-referenced test item selection. Journal
of Educational Measurement, 20, 355-367.



Hofstee, W.K.B. (1983). Selectie. Utrecht, The Netherlands: Spectrum.

Huynh, H. (1976). Statistical considerations of mastery scores.
Psychometrika, 41, 65-79.

Huynh, H. (1980). A nonrandomized minimax solution for passing scores in
the binomial error model. Psychometrika, 45, 167-182. (a)

Huynh, H. (1980). Statistical inference for false positive and false
negative error rates in mastery testing. Psychometrika, 45, 107-120.
(b)

Mellenbergh, G.J., Koppelaar, H. & van der Linden, W.J. (1977).
Dichotomous decisions based on dichotomously scored items: A case
study. Statistica Neerlandica, 31, 161-169.

Mellenbergh, G.J. & van der Linden, W.J. (1979). The internal and
external optimality of decisions based on tests. Applied Psychological
Measurement, 3, 257-273.

Mellenbergh, G.J. & van der Linden, W.J. (1981). The linear utility
model for optimal selection. Psychometrika, 46, 283-293.

Mellenbergh, G.J. & van der Linden, W.J. (1982). Selecting items for
criterion-referenced tests. Evaluation in Education, 5, 117-190.

Novick, M.R. & Lindley, D.V. (1978). The use of more realistic utility
functions in educational applications. Journal of Educational
Measurement, 15, 181-191.

Novick, M.R. & Petersen, N.S. (1976). Towards equalizing educational and
employment opportunity. Journal of Educational Measurement, 13, 77-88.

Roe, R.A. (1979). The correction for restriction of range and the
difference between intended and actual selection. Educational and
Psychological Measurement, 39, 551-560.

Roe, R.A. (1983). Grondslagen der personeelsselectie. Assen, The
Netherlands: Van Gorcum.



Taylor, H.C. & Russell, J.T. (1939). The relationship of validity
coefficients to the practical effectiveness of tests in selection:
Discussion and tables. Journal of Applied Psychology, 23, 565-578.

van den Brink, W.P. (1982). Binomiale modellen in de testleer.
Proefschrift, Universiteit van Amsterdam.

van den Brink, W.P. & Koele, P. (1980). Item sampling, guessing, and
decision-making in achievement testing. British Journal of
Mathematical and Statistical Psychology, 33, 104-108.

van der Linden, W.J. (1978). Forgetting, guessing, and mastery: The
Macready and Dayton models revisited and compared with a latent trait
approach. Journal of Educational Statistics, 305-318.

van der Linden, W.J. (1980). Decision models for use with
criterion-referenced tests. Applied Psychological Measurement, 4,
469-492.

van der Linden, W.J. (1981). Estimating the parameters of Emrick's
mastery testing model. Applied Psychological Measurement, 5, 517-530.
(a)

van der Linden, W.J. (1981). Using aptitude measurements for the optimal
assignment of subjects to treatments with and without mastery score.
Psychometrika, 46, 257-274. (b)

van der Linden, W.J. (1982). Zur Schätzung der 'Proportion of Masters'
in kriteriumsorientierten Tests. Zeitschrift für Empirische Pädagogik,
6, 195-201. (a)

van der Linden, W.J. (1982). Passing score and length of a mastery test.
Evaluation in Education, 5, 149-165. (b)

van der Linden, W.J. (1982). A latent trait method for determining
intrajudge inconsistency in the Angoff and Nedelsky techniques of
standard setting. Journal of Educational Measurement, 19, 295-308. (c)



van der Linden, W.J. (1983). The use of moment estimators for mixtures
of two binomials with one known success parameter. Educational and
Psychological Measurement, 43, 321-330.

van der Linden, W.J. (1984). Some thoughts on the use of decision theory
to set cut-off scores: Comment on de Gruijter and Hambleton. Applied
Psychological Measurement, 8, 9-17. (a)

van der Linden, W.J. (1984). Over absolute en nog relatievere zak-slaag
beslissingen. Tijdschrift voor Onderwijsresearch, 9, 243-252. (b)

van der Linden, W.J. (1985). Decision theory in educational research and
testing. In T. Husen & T.N. Postlethwaite (Eds.), International
encyclopedia of education: Research and studies. Oxford: Pergamon
Press. (a)

van der Linden, W.J. (1985). The use of test scores for classification
decisions with threshold utility. Submitted for publication. (b)

van der Linden, W.J. & Mellenbergh, G.J. (1977). Optimal cutting scores
using a linear loss function. Applied Psychological Measurement, 1,
593-599.

van der Linden, W.J. & Mellenbergh, G.J. (1978). Coefficients for tests
from a decision theoretic point of view. Applied Psychological
Measurement, 2, 119-134.

van Naerssen, R.F. (1963). Selectie van chauffeurs. Dissertatie,
Wolters, Groningen.

van Naerssen, R.F. (1965). Enkele eenvoudige besliskundige toepassingen
bij test en selectie. Nederlands Tijdschrift voor de Psychologie, 20,
364-380. (a)

van Naerssen, R.F. (1965). Application of the decision theoretical
approach to the selection of drivers. In L.J. Cronbach & G.C. Gleser,
Psychological tests and personnel decisions (2nd ed.). Urbana, IL:
University of Illinois Press. (b)




van Naerssen, R.F. (1966). Het handhaven van eenmaal aangenomen normen bij opeenvolgende objectieve toetsen. Paedagogische Studiën, 43, 312-320.

van Naerssen, R.F. (1971). Een model voor tentamens. Nederlands Tijdschrift voor de Psychologie, 26, 121-132.

van Naerssen, R.F. (1974). A mathematical model for the optimal use of criterion referenced tests. Nederlands Tijdschrift voor de Psychologie, 29, 341-446. (a)

van Naerssen, R.F. (1974). Psychometrische aspecten van de kernitemmethode. Nederlands Tijdschrift voor de Psychologie, 29, 421-430. (b)

van Naerssen, R.F. (1977). Lokale betrouwbaarheid: begrip en operationalisatie. Tijdschrift voor Onderwijsresearch, 2, 111-119. (a)

van Naerssen, R.F. (1977). Grafieken voor de schatting van de helling van itemkarakteristieken. Tijdschrift voor Onderwijsresearch, 2, 193-201. (b)

Veldhuijzen, N.H. (1982). Setting cutting scores: A minimum information approach. Evaluation in Education, 5, 141-148.

Vijn, P. (1980). Prior information in linear models. Thesis, University of Groningen.

Vijn, P. & Molenaar, I.W. (1981). Robustness regions for dichotomous decisions. Journal of Educational Statistics, 6, 205-235.

Vrijhof, B.J., Mellenbergh, G.J. & van den Brink, W.P. (1983). Assessing and studying utility functions in psychometric decision theory. Applied Psychological Measurement, 7, 341-357.

Wilcox, R.R. (1976). A note on the length and passing score of a mastery test. Journal of Educational Statistics, 1, 359-364.




Wilcox, R.R. (1977). Estimating the likelihood of false-positive and false-negative decisions in mastery testing: An empirical Bayes approach. Journal of Educational Statistics, 2, 289-307.

Wilcox, R.R. (1978). A note on decision theoretic coefficients for tests. Applied Psychological Measurement, 2, 609-613.

Wilcox, R.R. (1979). Comparing examinees to a control. Psychometrika, 44, 55-68. (c)








Figure 2. Flowchart of a Mastery Decision

Flowchart of a Placement Decision (Case of Two Treatments)

Flowchart of a Classification Decision (Case of Two Treatments)

Figure 5. Logistic regression lines for the three treatments.
Table 1

Empirical Proportion of Successes as a Function
of Test Scores for the Three Treatments

                Proportion of Successes After 1 Yr.
Test Score      LVE        LGE        MGE

 0 -  5
 6 - 10         .897       .575
11 - 15                               .571
16 - 20         .929       .619
21 - 25         .947       .760
26 - 30         .948       .840       .788
31 - 35         .952       .890       .860
36 - 40         .959       .930       .920
41 - 45                    .960       .960
46 - 50         .979       .960       .988

No. of Cases    1333       15926      22*8

Slope           .031       .095       .099
Intercept       -.8        -1.0       -1.25
Model Fit       .641       .071       .105
Table 2

Optimal Cutting Scores Between the Treatments
as a Function of the Utility Ratio

Utility      Optimal Cutting Score
Ratio        LGE/LVE     MGE/LGE

1.00            4           -
1.02            3          30
1.04            3          24
1.06            2          19
1.08            2          16
1.10            -          13
1.12            -          10
1.14            -           8
1.16            -           6
1.18            -           4
1.20            -           1

Note. Utility ratio is defined as (w_j - v_j)/(w_{j-1} - v_{j-1}). Dash
indicates cutting score outside the range of test scores.
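
As an illustration of how a utility ratio translates into a cutting score, the following Python sketch applies a threshold-utility rule directly to the empirical proportions of success reported in Table 1 for LGE and MGE. It is a minimal sketch under stated assumptions, not the procedure used to produce Table 2. It assumes that both treatments share the same failure utility v, so that only the ratio of the two utility differences matters, and it works with the five-point score intervals of Table 1 instead of fitted regression lines. Its results therefore need not coincide with the entries of Table 2. All function and variable names are hypothetical.

```python
# Empirical success proportions from Table 1, keyed by the lower bound of each
# score interval. Only intervals reported for both LGE and MGE are included.
LGE = {26: 0.840, 31: 0.890, 36: 0.930, 41: 0.960, 46: 0.960}
MGE = {26: 0.788, 31: 0.860, 36: 0.920, 41: 0.960, 46: 0.988}


def expected_utility(p_success, w, v):
    """Expected threshold utility: w on success, v on failure."""
    return v + (w - v) * p_success


def lowest_interval_preferring_higher(p_low, p_high, ratio, v=0.0):
    """Return the lower bound of the first score interval at which the higher
    treatment has at least the expected utility of the lower treatment.

    Assumption (not taken from the paper): both treatments share the same
    failure utility v, and the lower treatment's success utility is v + 1,
    so that `ratio` equals (w_high - v) / (w_low - v).
    """
    w_low = v + 1.0
    w_high = v + ratio
    for lower_bound in sorted(p_low):
        eu_low = expected_utility(p_low[lower_bound], w_low, v)
        eu_high = expected_utility(p_high[lower_bound], w_high, v)
        if eu_high >= eu_low:
            return lower_bound
    return None  # no crossing inside the tabulated score range


for ratio in (1.00, 1.02, 1.04, 1.06, 1.08):
    print(ratio, lowest_interval_preferring_higher(LGE, MGE, ratio))
```

For each utility ratio the loop prints the lower bound of the first score interval at which MGE is preferred to LGE. A larger ratio moves this point toward lower scores, which is the same qualitative pattern as in the MGE/LGE column of Table 2.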
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